Semi-automatic Building Method for a Multidimensional Affect Dictionary for a New Language
Authors
Abstract
Detecting the tone or emotive content of a text message is increasingly important in many natural language processing applications. Examples include rating new books, movies, or products; judging the mood of a customer e-mail and routing it accordingly; and measuring the reputation that a person or product has in the blogosphere. While a number of computer-usable affect, emotive, or opinion lexicons exist for automatically processing English text, other languages rarely possess these primary resources. Here we present a semi-automatic technique for quickly building a multidimensional affect lexicon for a new language. Most of the work consists of defining 44 paired affect directions (e.g. love-hate, courage-fear, ...) and choosing a small number of seed words for each dimension. From this initial investment, we show how a first-pass affect lexicon can be created for a new language, using an SVM classifier trained on a feature space produced by Latent Semantic Analysis over a large corpus in the new language. We evaluate the accuracy of placing newly found emotive words in one or more of the defined semantic dimensions. We illustrate this technique by creating an affect lexicon for French, but it can be applied to any language found on the Web for which a large quantity of text exists.
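The pipeline the abstract describes (seed words, a Latent Semantic Analysis feature space, an SVM classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn is available, uses a tiny hypothetical English corpus in place of a large target-language corpus, covers a single love-hate dimension rather than all 44, and all function names, seed words, and parameter values are invented for the example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical toy corpus; the paper relies on a large corpus in the new language.
corpus = [
    "i love and adore this wonderful warm film",
    "we adore the warm and wonderful story and cherish it",
    "they cherish and love the wonderful ending",
    "i hate and despise this awful cold film",
    "we despise the cold and awful story and loathe it",
    "they loathe and hate the awful ending",
]

# Hypothetical seed words for one paired affect dimension (love-hate).
seeds = {"love": "love", "adore": "love", "hate": "hate", "despise": "hate"}


def build_word_vectors(docs, n_components=2, random_state=0):
    """Map each vocabulary word to a low-dimensional LSA vector."""
    vectorizer = CountVectorizer()
    doc_term = vectorizer.fit_transform(docs)  # documents x terms
    term_doc = doc_term.T                      # terms x documents
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    vectors = svd.fit_transform(term_doc)      # terms x latent dimensions
    return {word: vectors[idx] for word, idx in vectorizer.vocabulary_.items()}


def train_dimension_classifier(word_vectors, seed_labels):
    """Train a linear SVM on the LSA vectors of the seed words."""
    words = [w for w in seed_labels if w in word_vectors]
    features = np.vstack([word_vectors[w] for w in words])
    labels = [seed_labels[w] for w in words]
    classifier = SVC(kernel="linear")
    classifier.fit(features, labels)
    return classifier


def classify(classifier, word_vectors, candidates):
    """Assign a pole of the dimension to each candidate found in the corpus."""
    known = [w for w in candidates if w in word_vectors]
    if not known:
        return {}
    features = np.vstack([word_vectors[w] for w in known])
    predictions = classifier.predict(features)
    return {w: str(label) for w, label in zip(known, predictions)}


if __name__ == "__main__":
    vectors = build_word_vectors(corpus)
    model = train_dimension_classifier(vectors, seeds)
    print(classify(model, vectors, ["cherish", "loathe"]))
```

On a corpus this small the predicted labels are not meaningful; the point is only the shape of the workflow. A fuller version would presumably train one such classifier per affect dimension and let a word fall into more than one of them, as the evaluation in the abstract suggests.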
Similar resources
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
The quality of a speech signal degrades significantly in the presence of environmental noise, which leads to imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech signals corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...
Using resources from a closely-related language to develop ASR for a very under-resourced language: a case study for Iban
This paper presents our strategies for developing an automatic speech recognition system for Iban, an under-resourced language. We faced several challenges such as no pronunciation dictionary and lack of training material for building acoustic models. To overcome these problems, we proposed approaches which exploit resources from a closely-related language (Malay). We developed a semi-supervise...
A New Dictionary Construction Method in Sparse Representation Techniques for Target Detection in Hyperspectral Imagery
Hyperspectral data in remote sensing, which have been gathered with efficient spectral resolution (about 10 nanometers), contain a plethora of spectral bands (roughly 200 bands). Since precious information about the spectral features of target materials can be extracted from these data, they have been used exclusively in hyperspectral target detection. One of the problems associated with the detect...
Semi-automatic compound nouns annotation for data integration systems
Lexical annotation is the explicit inclusion of the “meaning” of a data source element according to a lexical resource. Accuracy of semi-automatic lexical annotator tools is poor on real-world schemata due to the abundance of non-dictionary compound nouns. It follows that a large set of relationships among different schemata is discovered, including a great amount of false positive relationship...
Semi-Automatic UNL Dictionary Generation Using WordNet.PT
The increase in Internet users all over the world and the subsequent growth of available multilingual information on the Web have brought new challenges to machine translation systems. The Universal Networking Language (UNL) is a meta-language developed for conveying linguistic expressions in order to encode website information into a standard representation. In order to integrate Portuguese i...